-
-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated the Codebase to use pathlib instead of using os.path and path.py #208
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #208 +/- ##
=====================================
Coverage 0.76% 0.77%
=====================================
Files 15 15
Lines 1559 1556 -3
Branches 336 336
=====================================
Hits 12 12
+ Misses 1547 1544 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is a very good first PR.
Which docker command did you ran? (it is pretty hard to run a full scrape and be sure that you exercise all code branches, at least it takes time)
I made some inline comments.
In addition to them, I would like:
- please choose between the
/
operator or thejoinpath
function to combine paths, for consistency. Here you use both, and I find it does not help to read the code. Except if there are any good reasons to sometime use one and sometime prefer the other, but this is not obvious for me - please add an entry to the ChangeLog in the Unreleased section (you will probably need a new "Changed" subsection as well, there is none for now)
Some other remarks (this is not something to worry too much about, this is just our way of working):
- please always add the "Fix: #xxx" text in your first comment, so that the issue is automatically closed when PR is merged
- please assign a reviewer once PR is ready (when in doubt, ask who is the right person, but I'm the reviewer of all openZIM Python scrapers) ; this is the trigger for my action, no need to notify anywhere else
- PR is not ready if CI is failing ; if it is not clear for you why CI is failing, ask for help
- this is more a general remark, because here it was hard to understand what was going on, there was a nasty issue in
main
branch, so I fixed it myself
- this is more a general remark, because here it was hard to understand what was going on, there was a nasty issue in
- much more details on all this in https://github.com/openzim/overview/wiki/openZIM-workflow
Nota: black formatting is not OK as well ; you should probably:
|
…ct repeatedly. Changed snippets where src, dst, opff, download_cache are converted to Path objects. Removed unused import 'os' from entrypoint.py
…te paths maintaining consistency. Removed unused import 'os' from urls.py and rdf.py. Changed elif condition that checks if parser.title is empty string coz ruff gave error
Thank you for the feedback. I made all the changes you requested and also ensured that the formatting is perfect by enabling pre-commit. Kindly review the PR. |
I really like what you did, good job. |
Still did not had time to start a test run, sorry about that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure that a full run (e.g. python src/gutenberg2zim
) passes before requesting again a review.
Ziming a single book does not work:
python src/gutenberg2zim --books 84
Closed for inactivity, feel free to reopen if you have new changes to submit. |
Fix #195
Make sure consistency is maintained by using
pathlib.Path
to handle all file operations.After the changes I ran the docker container which worked perfectly fine.